Fix parsing of numbers of Locales with "non-standard"-grouping and/or decimal separators #3778

kimmerin · 2024-02-05T18:53:54Z

Hi,

as discussed in the forum, if numbers are formatted e.g. with the Swiss Locale, they can't be parsed. This happens for regular and limit price values. I've fixed this issue by creating a regex pattern using the actual values from DecimalFormatSymbols and refactored the classes a bit to be able to put tests around them

getTextColor to find a useful color in dependence of the background. Changed the threshold of getTextColor because "pure" green is way to bright for a white foreground color to be readable

e.g. de_CH

buchen

Thanks for tackling this problem.

BTW, it would greatly help me if you create a new branch for every pull request against the portfolio/master branch. This branch included commits from previous pull requests that I had merged in - but sometimes I rebase the commit, sometimes I slightly edit the commit. Therefore I had those left-overs here.

buchen · 2024-02-15T20:43:03Z

name.abuchen.portfolio/src/name/abuchen/portfolio/model/AttributeType.java

-    private static final Pattern PATTERN = Pattern.compile("^([\\d.,-]*)$"); //$NON-NLS-1$
-    private static final Pattern LIMIT_PRICE_PATTERN = Pattern.compile("^\\s*(<=?|>=?)\\s*([0-9,.']+)$"); //$NON-NLS-1$
+    private static Pattern PATTERN;
+    private static Pattern LIMIT_PRICE_PATTERN;
+
+    static
+    {
+        initPatterns();
+    }
+
+    @SuppressWarnings("nls")
+    static void initPatterns()
+    {
+        DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault(Locale.Category.FORMAT));
+
+        // e.g. \\-?(\\d+\\.)*\\d+(\\,\\d+)?
+        String numberRegex = "\\" + dfs.getMinusSign() + "?(\\d+\\" + dfs.getGroupingSeparator() + ")*\\d+(\\"
+                        + dfs.getDecimalSeparator() + "\\d+)?";
+        PATTERN = Pattern.compile("^\\s*" + numberRegex + "\\s*$");
+        LIMIT_PRICE_PATTERN = Pattern.compile("^\\s*(<=?|>=?)\\s*(" + numberRegex + ")\\s*$");
+    }


Let me try to understand the motivation for this change.

The current code first applies a pattern to check if it is kind of a normal number (and to find the indicator '>', '<', etc. for the limit price) but then parses the value the a DecimalFormat build from the current locale. If either of those fail, then an error is thrown.

To be honest, I can't remember why the code is the way it is. Because one could argue that the DecimalFormat is enough - if it parses the number, then it is a number (maybe check if there are no remaining characters left in the string).

Therefore, instead of making the pattern more complex, I wonder if we either get rid of the pattern at all or just add the grouping separator - say ' and a blank . Is your concern that there are unknown separators for obscure locales?

private static final Pattern PATTERN = Pattern.compile("^([\\d.,-' ]*)$"); //$NON-NLS-1$

My theory about this particular logic is that checking the value to be parsed against a regex makes sure that you don't successfully parse values as correct while they aren't correct, e.g. an english formatted 1,234.56 with a german Locale. Have to check but I think the result would be 1.23456. Fun fact: The original regex wouldn't catch this problem ;-)

Of course, if it can be removed alltogether, I can remove it but my motivation was to fix the issue without changing any preexisting logic.

Therefore, instead of making the pattern more complex, I wonder if we either get rid of the pattern at all or just add the grouping separator - say ' and a blank . Is your concern that there are unknown separators for obscure locales?

I'd say: Do a correct check (requiring the more complex regex I've created) or leave it away. But that's no my decision to make ;-)

To be honest, I can't remember why the code is the way it is. Because one could argue that the DecimalFormat is enough

After further thinking about this: DecimalFormat would also successfully parse formats using the exp-syntax like 1.2345E2. The check against the regex prevents that from happening. Maybe that was the motivation "back then" and might be a reason in favor of keeping the check today.

Maybe that was the motivation "back then" and might be a reason in favor of keeping the check today.

I am not sure what my motivation was back then. Maybe a naive thinking that number parsing can be easier...

just on example: our friends in Belgium use the dot as decimal separator even though that is not the standard separator in the locale. Why? Because the dot is easier to reach on the keyboard. Even Excel supports this - I checked it.

But anyway, I think your code is as good as it gets for parsing a locale-specific number based on the configured symbols.

The more I think about it, maybe we should use the StringToCurrencyConverter here as well. That would make the parsing consistent to other input fields.

buchen · 2024-02-15T20:45:30Z

name.abuchen.portfolio/src/name/abuchen/portfolio/money/Values.java

-        private static final String QUOTE_PATTERN = "#,##0.00######"; //$NON-NLS-1$
+        static final String QUOTE_PATTERN = "#,##0.00######"; //$NON-NLS-1$

-        private static final ThreadLocal<DecimalFormat> QUOTE_FORMAT = ThreadLocal // NOSONAR
+        static final ThreadLocal<DecimalFormat> QUOTE_FORMAT = ThreadLocal // NOSONAR


Do I understand it correct, that this change is made to facility testing? Because you need to change the locale, but the DecimalFormat in the ThreadLocal also needs to change? If so, then I think an explicit method is also okay. If we annotate it with @VisibleForTesting then the IDE even provides hints in case we accidentally want to use it.

I've made the change to be able to change the Locale during testing in order to replace the instance of DecimalFormat in QUOTE_FORMAT after it has changed. Concerning a method: I've implemented one in ValuesBuilder.initQuoteValuesDecimalFormat which is why I've changed the visibility of these two members in the first place. So I'm not sure if another method in non-test-code is necessary.

buchen · 2024-02-15T20:46:02Z

name.abuchen.portfolio.tests/src/name/abuchen/portfolio/money/ValuesBuilder.java

+        };
+    }
+
+    public static <T extends Number> Values<T> createNumberValues(String pattern, int precision)


I think this (test) method is not used anywhere in the code

Correct, I've created it and then realized that the code that is involved in the parsing of monetary values didn't use the parse-methods in the Values-instance but above DecimalFormat in QUOTE_FORMAT. I've left the method in because it might come handy in future testcases where this might be needed.

kimmerin · 2024-02-25T17:47:27Z

Hi @buchen,

are there any open questions or issues with this PR that I need to address? Concerning the issue with the "left over" class of another PR, can this be solved from your side by ignoring that class when merging the changes into the code base? I'm not that git-savvy and am concerned that by trying to get this sorted out on my side, this will end in misery.

buchen · 2024-03-05T07:25:42Z

are there any open questions or issues with this PR that I need to address?

I was just busy with other things. I will pick it up now.

The attribute implementation uses a pattern to check for valid numbers before parsing the number. The pattern did not take local dependent thousands and decimal separator into account. With this change, the pattern is created based on the given locale. To enable testing, the pattern is created when the converter is created. Because converters are created only once per attribute, this should not be a performance problem. Issue: #3778 Co-authored-by: Lothar <github@kimmeringer.de> Co-authored-by: Andreas Buchen <andreas.buchen@gmail.com>

buchen · 2024-03-05T09:08:09Z

Merged.

I decided to get rid of the static pattern creation. If it is not static, then the test can just instantiate a class and the pattern will use the right locale. I do not think that is a performance problem, because the AttributeType is only instantiated once and therefore the converter is only instantiated once. So it is not like we have now thousands of Pattern instantiations.

And I decided to not test the toString methods. Why? Because it makes the code really messy due to the static Values.*. Of course, today I would not use such pattern anymore, but that's the way it is. Plus: it couples the formatting with the definition of digits which also does not really make sense. Anyway, the tests are valuable because they ensure proper parsing.

Thanks for the change. And thanks for your patience. And thanks for the new test case - that made it much easier to do the change.

kimmerin · 2024-03-09T17:50:07Z

Thanks for the change. And thanks for your patience. And thanks for the new test case - that made it much easier to do the change.

No problem and that's what unit tests are for ;-)

kimmerin added 5 commits February 1, 2024 19:09

Set foreground color for the limit exceeded widget as well and use

973c672

getTextColor to find a useful color in dependence of the background. Changed the threshold of getTextColor because "pure" green is way to bright for a white foreground color to be readable

added a test as starting point for more

136b9dd

removed system out println

be5013d

Fixed the parsing of numbers with "non-standard" formatting Locales,

075b868

e.g. de_CH

removed (again)

a557a97

kimmerin changed the title ~~Fix parsing of nubers of Locales with "non-standard"-grouping and/or decimal separators~~ Fix parsing of numbers of Locales with "non-standard"-grouping and/or decimal separators Feb 5, 2024

call init-methods in setup as well

96ae5c4

kimmerin force-pushed the master branch from d94ea6f to 96ae5c4 Compare February 5, 2024 19:00

buchen reviewed Feb 15, 2024

View reviewed changes

buchen self-assigned this Feb 15, 2024

buchen added the bug label Feb 15, 2024

buchen closed this Mar 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix parsing of numbers of Locales with "non-standard"-grouping and/or decimal separators #3778

Fix parsing of numbers of Locales with "non-standard"-grouping and/or decimal separators #3778

kimmerin commented Feb 5, 2024

buchen left a comment

buchen Feb 15, 2024 •

edited

kimmerin Feb 15, 2024

kimmerin Feb 15, 2024

kimmerin Feb 16, 2024 •

edited

buchen Mar 5, 2024

buchen Mar 5, 2024

buchen Feb 15, 2024

kimmerin Feb 15, 2024

buchen Feb 15, 2024

kimmerin Feb 15, 2024

kimmerin commented Feb 25, 2024

buchen commented Mar 5, 2024

buchen commented Mar 5, 2024

kimmerin commented Mar 9, 2024

Fix parsing of numbers of Locales with "non-standard"-grouping and/or decimal separators #3778

Fix parsing of numbers of Locales with "non-standard"-grouping and/or decimal separators #3778

Conversation

kimmerin commented Feb 5, 2024

buchen left a comment

Choose a reason for hiding this comment

buchen Feb 15, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kimmerin Feb 16, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kimmerin commented Feb 25, 2024

buchen commented Mar 5, 2024

buchen commented Mar 5, 2024

kimmerin commented Mar 9, 2024

buchen Feb 15, 2024 •

edited

kimmerin Feb 16, 2024 •

edited